This report analyzes the emotions expressed in the texts of different philosophers contained in our Philosophy Data Project dataset. The dataset holds 360,808 quotes, 13 schools of philosophy (plato, aristotle, empiricism, rationalism, analytic, continental, phenomenology, german_idealism, communism, capitalism, stoicism, nietzsche, feminism), and 36 philosophers (Plato, Aristotle, Locke, Hume, Berkeley, Spinoza, Leibniz, Descartes, Malebranche, Russell, Moore, Wittgenstein, Lewis, Quine, Popper, Kripke, Foucault, Derrida, Deleuze, Merleau-Ponty, Husserl, Heidegger, Kant, Fichte, Hegel, Marx, Lenin, Smith, Ricardo, Keynes, Epictetus, Marcus Aurelius, Nietzsche, Wollstonecraft, Beauvoir, Davis).
The quotes are split into words, since the emotional analysis is done at the word level. The analysis is designed to answer three questions: What is each philosopher's emotional status (positive vs. negative)? How can their emotional statuses be categorized? And into how many clusters can the philosophers be grouped based on their emotional status?
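A minimal sketch of this tokenization step, assuming the raw quotes are loaded as `df` with a `sentence_str` text column (the data-frame and column names are assumptions; adjust them to the actual dataset):

```r
library(dplyr)
library(tidytext)

# Split each quote into one row per lowercase word;
# non-emotional words will be dropped later by the lexicon joins.
df_words <- df %>%
  unnest_tokens(word, sentence_str) %>%
  select(author, school, title, word)

head(df_words)
```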
## author school title word
## 1 Plato plato Plato - Complete Works what
## 2 Plato plato Plato - Complete Works new
## 3 Plato plato Plato - Complete Works socrates
## 4 Plato plato Plato - Complete Works to
## 5 Plato plato Plato - Complete Works make
## 6 Plato plato Plato - Complete Works you
In the binary emotional analysis (positive vs. negative) below, at both the philosopher level and the school level, we see that, with a few exceptions, most philosophers and schools exhibit an evenly balanced ratio of positive to negative words, around 0.5.
The exceptions include Foucault (continental), Fichte (german_idealism), the school capitalism, and Plato.
library(tidytext)
library(dplyr)
library(ggplot2)

# Load the Bing (binary) and NRC (multi-category) sentiment lexicons
bing <- get_sentiments("bing")
nrc <- get_sentiments("nrc")
# Keep only words that appear in the Bing lexicon
tidy_bing <- df_words %>% inner_join(bing, by = "word")
# Philosopher Level
tidy_bing %>%
  group_by(author) %>%
  count(sentiment) %>%
  ungroup() %>%
  ggplot(aes(n, author, fill = sentiment)) +
  geom_col(position = "fill") +
  geom_text(aes(label = n), position = position_fill(0.5), color = "white") +
  theme_dark() +
  theme(
    legend.position = "bottom",
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold")
  ) +
  scale_fill_manual(values = c("#EA181E", "#00B4E8")) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
  labs(y = NULL, x = "ratio", fill = NULL, title = "Negative-Positive Ratio at Philosopher Level")
# School Level
tidy_bing %>%
  group_by(school) %>%
  count(sentiment) %>%
  ungroup() %>%
  ggplot(aes(n, school, fill = sentiment)) +
  geom_col(position = "fill") +
  geom_text(aes(label = n), position = position_fill(0.5), color = "white") +
  theme_dark() +
  theme(
    legend.position = "bottom",
    plot.title = element_text(hjust = 0.5, size = 20, face = "bold")
  ) +
  scale_fill_manual(values = c("#EA181E", "#00B4E8")) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 10)) +
  labs(y = NULL, x = "ratio", fill = NULL, title = "Negative-Positive Ratio at School Level")
In the non-binary analysis (with more emotional categories: anger, anticipation, disgust, fear, joy, negative, positive, sadness, surprise, trust), at both the philosopher level and the school level, we see that ‘positive’ and ‘trust’ words account for the majority of word counts for most philosophers and schools, whereas ‘anger’, ‘disgust’, and ‘fear’ words account for only a small portion. Every philosopher shows a similar overall ratio across the emotions.
# Philosopher Level
# NRC maps one word to possibly several emotions, so this join may duplicate rows by design
tidy_nrc <- df_words %>% inner_join(nrc, by = "word")
tidy_nrc %>%
group_by(author) %>%
count(sentiment) %>%
ungroup() %>%
ggplot(aes(n, author, fill = sentiment))+
geom_col(position = "fill")
# School Level (reuses tidy_nrc computed above)
tidy_nrc %>%
group_by(school) %>%
count(sentiment) %>%
ungroup() %>%
ggplot(aes(n, school, fill = sentiment))+
geom_col(position = "fill")
In the emotional word-count analysis below, each school's 100 most frequently used emotional words are plotted at a size proportional to their frequency. As the graphs show, all of the schools use positive emotional words (e.g., “true”, “good”, “money”, “present”, “mother”, “feeling”, “god”) most frequently. The only exception is the school continental, which uses “madness” most frequently.
library(wordcloud)
library(RColorBrewer)

tidy_nrc_counts <- tidy_nrc %>%
  group_by(school) %>%
  count(word) %>%
  ungroup()

for (school in unique(tidy_nrc_counts$school)) {
  df_school_nrc <- tidy_nrc_counts[which(tidy_nrc_counts$school == school), ]
  wordcloud(df_school_nrc$word, df_school_nrc$n,
            scale = c(3, 0.1),
            max.words = 100,
            min.freq = 1,
            random.order = FALSE,
            rot.per = 0.35,
            random.color = FALSE,
            colors = brewer.pal(12, "Paired"))
  title(school)  # wordcloud() has no main argument; add the title separately
}
From now on, we will see how many groups (clusters) the philosophers can be divided into based on their emotional statuses, using the K-means clustering algorithm. The first step in creating and training the model is to generate a training dataset: it is designed so that rows represent the different authors and columns represent the number of times each emotional category's words were used.
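The construction of this matrix is not shown in the report; a sketch of one way to build it from `tidy_nrc`, using a `tidyr` reshape (the object name `training_df` matches the later code):

```r
library(dplyr)
library(tidyr)

# Count each (author, sentiment) pair, then spread the
# sentiment categories into columns, filling gaps with 0
training_df <- tidy_nrc %>%
  count(author, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  as.data.frame()

rownames(training_df) <- training_df$author
training_df$author <- NULL
head(training_df)
```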
## anger anticipation disgust fear joy negative positive sadness
## Aristotle 14885 25250 11728 21296 22245 37471 60772 17438
## Beauvoir 5994 9639 4884 7730 10377 14752 21109 6964
## Berkeley 637 1316 321 911 858 1844 3912 673
## Davis 2045 1772 1089 2374 1558 3435 4349 1845
## Deleuze 3857 6933 2679 6036 4124 11199 16680 5153
## Derrida 2187 3245 1148 3144 1967 4734 8402 2348
## surprise trust
## Aristotle 11262 37059
## Beauvoir 3922 12604
## Berkeley 461 2051
## Davis 764 2590
## Deleuze 2801 11266
## Derrida 1293 4622
The second step is data pre-processing. This step involves min-max normalizing each row of the dataset so that all rows have the same range of values across the columns.
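For a single row, min-max normalization maps the smallest count to 0 and the largest to 1. A tiny self-contained illustration in base R (the counts are made-up):

```r
# Min-max normalize a numeric vector to the [0, 1] range
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

row <- c(anger = 10, joy = 30, trust = 50)
min_max(row)
# anger 0.0, joy 0.5, trust 1.0
```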
# 2. Data pre-processing
rows <- rownames(training_df)
cols <- colnames(training_df)
training_df_nu <- apply(training_df, 2, as.numeric)
# Min-max normalize each row so every author's counts span [0, 1]
training_df_normalized <- t(apply(training_df_nu, 1, function(x) (x - min(x)) / (max(x) - min(x))))
train_df <- as.data.frame(training_df_normalized)
rownames(train_df) <- rows
colnames(train_df) <- cols
head(train_df)
## anger anticipation disgust fear joy negative
## Aristotle 0.07317714 0.2825288 0.00941224 0.2026661 0.2218340 0.5293678
## Beauvoir 0.12055623 0.3326351 0.05597254 0.2215628 0.3755746 0.6301274
## Berkeley 0.08799777 0.2770816 0.00000000 0.1642996 0.1495405 0.4241158
## Davis 0.35732218 0.2811715 0.09065551 0.4490934 0.2214784 0.7450488
## Deleuze 0.08413685 0.3038354 0.00000000 0.2397686 0.1032069 0.6085280
## Derrida 0.14323132 0.2890819 0.00000000 0.2751585 0.1129032 0.4943479
## positive sadness surprise trust
## Aristotle 1 0.12474248 0.000000000 0.5210463
## Beauvoir 1 0.17699424 0.000000000 0.5051492
## Berkeley 1 0.09802283 0.038986355 0.4817600
## Davis 1 0.30153417 0.000000000 0.5093445
## Deleuze 1 0.17670166 0.008713663 0.6133133
## Derrida 1 0.16542597 0.019988972 0.4789082
The next step is to train k-means models with different hyper-parameter values (the number of clusters, k). The average silhouette score for each k is then plotted in the graphs below.
# 3. Training (k-means)
library(cluster)     # silhouette()
library(factoextra)  # fviz_nbclust()

model <- kmeans(train_df, 3)  # initial fit; k is chosen below via silhouette analysis

silhouette_score <- function(k) {
  km <- kmeans(train_df, centers = k, nstart = 25)
  ss <- silhouette(km$cluster, dist(train_df))
  mean(ss[, 3])  # average silhouette width
}
k <- 2:10
avg_sil <- sapply(k, silhouette_score)
plot(k, avg_sil, type = 'b', frame = FALSE,
     xlab = 'Number of clusters', ylab = 'Average Silhouette Scores')
fviz_nbclust(train_df, kmeans, method = 'silhouette')
As can be seen in the graphs above, the optimal number of clusters is 2, with an average silhouette score of around 0.4. In the following steps, we will see the distribution of philosophers across those two clusters.
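The code that produced the cluster-membership graph is not shown in the report; a sketch of one way to fit and visualize the final k = 2 model (`final_km` and the seed are assumptions, added for reproducibility):

```r
library(factoextra)

set.seed(1)
final_km <- kmeans(train_df, centers = 2, nstart = 25)

# Project authors onto the first two principal components,
# colored by cluster membership
fviz_cluster(final_km, data = train_df, repel = TRUE)

# List which authors fall into each cluster
split(rownames(train_df), final_km$cluster)
```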
As shown in the graph above, Nietzsche (nietzsche), Beauvoir (feminism), Lenin (communism), Deleuze (continental), Foucault (continental), Davis (feminism), and Epictetus (stoicism) are grouped into one cluster (cluster 1), and the others are grouped into the other (cluster 2).
In the following step, wordcloud graphs are created again to show which emotional words are frequently used by each of the two cluster groups.
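The per-cluster wordcloud code is not shown in the report; a sketch, assuming a fitted k = 2 model `final_km` (a hypothetical name) and the `tidy_nrc` emotional words from earlier:

```r
library(dplyr)
library(wordcloud)
library(RColorBrewer)

# Map each author to a cluster, then count words per cluster
cluster_df <- data.frame(author = rownames(train_df),
                         cluster = final_km$cluster)
tidy_nrc_cluster <- tidy_nrc %>%
  inner_join(cluster_df, by = "author") %>%
  count(cluster, word)

for (cl in unique(tidy_nrc_cluster$cluster)) {
  d <- tidy_nrc_cluster[tidy_nrc_cluster$cluster == cl, ]
  wordcloud(d$word, d$n, max.words = 100, random.order = FALSE,
            colors = brewer.pal(12, "Paired"))
  title(paste("Cluster", cl))
}
```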
In the graphs above, cluster 1 and cluster 2 exhibit relatively distinct frequencies in their emotional word use. Cluster 2 has “death” and “madness” as its most frequently used emotional words, and it also uses other non-positive emotional words such as “bad”, “evil”, and “struggle” fairly frequently. In contrast, the primarily used emotional words of cluster 1 are all positive words such as “good”, “god”, “true”, and “kind”.
In the following step, we can see each cluster's ratio of emotional words across the emotional categories.
Above, as expected from the previous analysis, cluster 1 shows a higher ratio of “positive” words in its emotional distribution, whereas cluster 2 shows higher ratios of “anger” and “disgust” words.
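The code behind this ratio chart is likewise not shown; a sketch, assuming a fitted k = 2 model `final_km` (a hypothetical name) and the `tidy_nrc` words from earlier:

```r
library(dplyr)
library(ggplot2)

cluster_df <- data.frame(author = rownames(train_df),
                         cluster = factor(final_km$cluster))

# Share of each emotional category within each cluster
tidy_nrc %>%
  inner_join(cluster_df, by = "author") %>%
  count(cluster, sentiment) %>%
  ggplot(aes(n, cluster, fill = sentiment)) +
  geom_col(position = "fill") +
  labs(x = "ratio", y = NULL, title = "Emotion Ratios by Cluster")
```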
Thank you